Empirical Perturbation Analysis of Two Adversarial Attacks: Black Box versus White Box

Authors

Abstract

Through the addition of humanly imperceptible noise to an image classified as belonging to a category c_a, targeted adversarial attacks can lead convolutional neural networks (CNNs) to classify the modified image as belonging to any predefined target class c_t ≠ c_a. To achieve a better understanding of the inner workings of adversarial attacks, this study analyzes the images created by two completely opposite attacks against 10 ImageNet-trained CNNs. A total of 2×437 adversarial images are created by EA^{target,C}, a black-box evolutionary algorithm (EA), and by the basic iterative method (BIM), a white-box, gradient-based attack. We inspect and compare these two sets of images from different perspectives: the behavior of the CNNs at smaller image regions, the noise frequency, the adversarial image transferability, the texture change, and the penultimate CNN layer activations. We find that texture change is a side effect rather than a means of the attacks, and that c_t-relevant features only build up significantly from image regions of size 56×56 onwards. In the penultimate layers, both attacks increase the activation of units that are positively related to c_t and negatively related to c_a. In contrast to EA^{target,C}'s white-noise nature, BIM predominantly introduces low-frequency noise. BIM affects the original c_a features more strongly, thus producing slightly more transferable adversarial images. However, the transferability of both attacks is low, since the attacks' c_t-related information is specific to the output layers of the targeted CNN; this actually holds at smaller region sizes as well as at the full image scale.
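As a concrete reference for the white-box attack discussed above, the following is a minimal sketch of a targeted BIM loop in PyTorch. It is illustrative only: the ResNet-50 model, the random stand-in image, the target class index 207, and the eps/alpha/step values are assumptions for the example, not the paper's experimental setup. The final lines show one simple way to inspect the perturbation's frequency content, in the spirit of the abstract's low-frequency versus white-noise comparison.

```python
# Minimal sketch of a targeted BIM attack, assuming PyTorch/torchvision.
# Model, image, target class, and hyperparameters are illustrative
# placeholders, not the paper's exact experimental setup.
import torch
import torch.nn.functional as F
from torchvision import models

def targeted_bim(model, x, c_t, eps=8/255, alpha=1/255, steps=10):
    """Iteratively push x toward class c_t inside an L-infinity ball of radius eps."""
    x_orig = x.detach()
    x_adv = x.detach().clone()
    target = torch.tensor([c_t])
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        (grad,) = torch.autograd.grad(loss, x_adv)
        # Targeted variant: step *against* the gradient to decrease the
        # loss of the target class c_t.
        x_adv = x_adv.detach() - alpha * grad.sign()
        # Project back into the eps-ball around the original image,
        # then into the valid pixel range.
        x_adv = torch.clamp(x_adv, x_orig - eps, x_orig + eps).clamp(0.0, 1.0)
    return x_adv

model = models.resnet50(weights="IMAGENET1K_V1").eval()
x = torch.rand(1, 3, 224, 224)           # stand-in for a preprocessed ImageNet image
x_adv = targeted_bim(model, x, c_t=207)  # 207: an arbitrary ImageNet class index

# A quick look at the perturbation's frequency content: energy concentrated
# near the center of the shifted spectrum indicates low-frequency noise.
noise = (x_adv - x).squeeze(0).mean(dim=0)
spectrum = torch.fft.fftshift(torch.fft.fft2(noise)).abs()
```

In a realistic setting, the image would additionally be normalized according to the model's preprocessing before the attack is run; the sketch omits that step for brevity.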


Similar Articles

Delving into Transferable Adversarial Examples and Black-box Attacks

An intriguing property of deep neural networks is the existence of adversarial examples, which can transfer among different architectures. These transferable adversarial examples may severely hinder deep neural network-based applications. Previous works mostly study the transferability using small-scale datasets. In this work, we are the first to conduct an extensive study of the transferability...


HotFlip: White-Box Adversarial Examples for NLP

Adversarial examples expose vulnerabilities of machine learning models. We propose an efficient method to generate white-box adversarial examples that trick character-level and word-level neural models. Our method, HotFlip, relies on an atomic flip operation, which swaps one token for another, based on the gradients of the one-hot input vectors. In experiments on text classification and machine translation...


Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models

Many machine learning algorithms are vulnerable to almost imperceptible perturbations of their inputs. So far it was unclear how much risk adversarial perturbations carry for the safety of real-world machine learning applications, because most methods used to generate such perturbations rely either on detailed model information (gradient-based attacks) or on confidence scores such as class probabilities...


Query-Efficient Black-box Adversarial Examples

Current neural network-based image classifiers are susceptible to adversarial examples, even in the black-box setting, where the attacker is limited to query access without access to gradients. Previous methods (substitute networks and coordinate-based finite-difference methods) are either unreliable or query-inefficient, making these methods impractical for certain problems. We introduce a new...




Journal

Journal title: Applied Sciences

Year: 2022

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app12147339